training framework
A Unified, Scalable Framework for Neural Population Decoding
Our ability to use deep learning approaches to decipher neural activity would likely benefit from greater scale, in terms of both the model size and the datasets. However, the integration of many neural recordings into one unified model is challenging, as each recording contains the activity of different neurons from different individual animals. In this paper, we introduce a training framework and architecture designed to model the population dynamics of neural activity across diverse, large-scale neural recordings. Our method first tokenizes individual spikes within the dataset to build an efficient representation of neural events that captures the fine temporal structure of neural activity. We then employ cross-attention and a PerceiverIO backbone to further construct a latent tokenization of neural population activities. Utilizing this architecture and training framework, we construct a large-scale multi-session model trained on large datasets from seven nonhuman primates, spanning over 158 different sessions of recording from over 27,373 neural units and over 100 hours of recordings. In a number of different tasks, we demonstrate that our pretrained model can be rapidly adapted to new, unseen sessions with unspecified neuron correspondence, enabling few-shot performance with minimal labels. This work presents a powerful new approach for building deep learning tools to analyze neural data and stakes out a clear path to training at scale for neural decoding models.
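The two steps named in this abstract, per-spike tokenization followed by cross-attention into a fixed set of latent tokens, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the sinusoidal time features, the random embedding table, and the absence of learned query/key/value projections are all assumptions made for illustration.

```python
import numpy as np

def tokenize_spikes(spike_times_by_unit):
    """Flatten per-unit spike trains into one time-sorted token sequence.

    Each spike becomes one token, identified by (unit index, timestamp).
    """
    unit_ids, timestamps = [], []
    for unit_id, times in spike_times_by_unit.items():
        unit_ids.extend([unit_id] * len(times))
        timestamps.extend(times)
    order = np.argsort(timestamps, kind="stable")
    return np.asarray(unit_ids)[order], np.asarray(timestamps, dtype=float)[order]

def embed_tokens(unit_ids, timestamps, unit_table):
    """Unit embedding plus sinusoidal time features (an assumed encoding)."""
    dim = unit_table.shape[1]                      # must be even
    freqs = np.logspace(0, 2, dim // 2)            # hypothetical frequency range
    angles = 2.0 * np.pi * timestamps[:, None] * freqs[None, :]
    time_feat = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return unit_table[unit_ids] + time_feat

def cross_attend(latents, tokens):
    """Single-head cross-attention: latent queries attend over spike tokens."""
    scores = latents @ tokens.T / np.sqrt(latents.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ tokens                        # (n_latents, dim)

# Toy session: three units with a handful of spike times in seconds.
rng = np.random.default_rng(0)
dim, n_units, n_latents = 8, 3, 4
spikes = {0: [0.10, 0.35], 1: [0.20], 2: [0.05, 0.30]}
unit_ids, timestamps = tokenize_spikes(spikes)
unit_table = rng.normal(size=(n_units, dim))       # stands in for learned embeddings
latents = rng.normal(size=(n_latents, dim))        # stands in for learned latent queries
tokens = embed_tokens(unit_ids, timestamps, unit_table)
latent_tokens = cross_attend(latents, tokens)
```

The output size depends only on the number of latent queries, not on the number of spikes or units, which is what lets sessions with different neuron counts share one backbone.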
PRO-V-R1: Reasoning Enhanced Programming Agent for RTL Verification
Zhao, Yujie, Wu, Zhijing, Yuan, Boqin, Yu, Zhongming, Zhang, Hejia, Ni, Wentao, Ho, Chia-Tung, Ren, Haoxing, Zhao, Jishen
Register-Transfer Level (RTL) verification is a primary bottleneck, consuming 60-70% of development time. While Large Language Models (LLMs) show promise for RTL automation, their performance and research focus have overwhelmingly centered on RTL generation rather than verification. Current methods for RTL verification rely on large-scale proprietary models (e.g., GPT-4o) to generate Python-based functional references, incurring a high cost and raising data-privacy risks. To date, an end-to-end open-source solution for autonomous verification remains absent. We introduce PRO-V-R1, the first trainable open-source agentic framework for autonomous RTL verification. Our contributions are threefold: (1) we design PRO-V sys, a modular agentic system that couples LLM-based reasoning with programmatic tool use for RTL verification; (2) we establish a data construction pipeline that leverages existing RTL datasets to build simulation-validated, expert-level trajectories tailored for supervised fine-tuning (SFT) of RTL verification agents; and (3) we implement an efficient reinforcement learning (RL) algorithm that uses verification-specific rewards derived from program-tool feedback to optimize the end-to-end verification workflow. Our empirical evaluation demonstrates that PRO-V-R1 achieves a 57.7% functional correctness rate and 34.0% in robust fault detection, significantly outperforming the base model's 25.7% and 21.8%, respectively, obtained with the state-of-the-art (SOTA) automatic verification system. This configuration also outperforms large-scale proprietary LLMs in functional correctness and shows comparable robustness for fault detection.
Co-EPG: A Framework for Co-Evolution of Planning and Grounding in Autonomous GUI Agents
Zhao, Yuan, Zhu, Hualei, Jiang, Tingyu, Li, Shen, Xu, Xiaohang, Wang, Hao Henry
Graphical User Interface (GUI) task automation constitutes a critical frontier in artificial intelligence research. While effective GUI agents synergistically integrate planning and grounding capabilities, current methodologies exhibit two fundamental limitations: (1) insufficient exploitation of cross-model synergies, and (2) over-reliance on synthetic data generation without sufficient utilization. To address these challenges, we propose Co-EPG, a self-iterative training framework for Co-Evolution of Planning and Grounding. Co-EPG establishes an iterative positive feedback loop: through this loop, the planning model explores superior strategies under grounding-based reward guidance via Group Relative Policy Optimization (GRPO), generating diverse data to optimize the grounding model. Concurrently, the optimized grounding model provides more effective rewards for subsequent GRPO training of the planning model, fostering continuous improvement. Co-EPG thus enables iterative enhancement of agent capabilities through self-play optimization and training data distillation. On the Multimodal-Mind2Web and AndroidControl benchmarks, our framework outperforms existing state-of-the-art methods after just three iterations without requiring external data. The agent consistently improves with each iteration, demonstrating robust self-enhancement capabilities. This work establishes a novel training paradigm for GUI agents, shifting from isolated optimization to an integrated, self-driven co-evolution approach.
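GRPO, which this abstract uses to train the planning model, scores each sampled output relative to the other samples drawn for the same prompt. A minimal sketch of that group-relative advantage follows. The reward values are placeholders standing in for grounding-based rewards, and the function name and `eps` parameter are assumptions; the abstract does not specify how Co-EPG computes its rewards.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of samples for the same prompt.

    Samples that score above the group mean get a positive advantage and
    samples below it get a negative one; no learned value function is needed.
    """
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Hypothetical rewards for four plans sampled for one GUI task.
plan_rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(plan_rewards)

# A group in which every plan earns the same reward carries no learning signal.
flat_advantages = group_relative_advantages([0.5, 0.5, 0.5])
```

Because the advantage is relative within a group, a better reward source (here, an improved grounding model) changes which plans are reinforced even when the planning prompts stay the same.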
Mixture-of-Minds: Multi-Agent Reinforcement Learning for Table Understanding
Zhou, Yuhang, Zhang, Mingrui, Li, Ke, Wang, Mingyi, Liu, Qiao, Wang, Qifei, Liu, Jiayi, Liu, Fei, Li, Serena, Li, Weiwei, Gao, Mingze, Kumar, Abhishek, Fan, Xiangjun, Zhao, Zhuokai, Zhang, Lizhu
Understanding and reasoning over tables is a critical capability for many real-world applications. Large language models (LLMs) have shown promise on this task, but current approaches remain limited. Fine-tuning-based methods strengthen language reasoning, yet they are prone to arithmetic errors and hallucination. In contrast, tool-based methods enable precise table manipulation but rely on rigid schemas and lack semantic understanding. These complementary drawbacks highlight the need for approaches that integrate robust reasoning with reliable table processing. In this work, we propose Mixture-of-Minds, a multi-agent framework that decomposes table reasoning into three specialized roles: planning, coding, and answering. This design enables each agent to focus on a specific aspect of the task while leveraging code execution for precise table manipulation. Building on this workflow, we introduce a self-improvement training framework that employs Monte Carlo Tree Search (MCTS) rollouts to generate pseudo-gold trajectories and optimize agents with reinforcement learning (RL). Extensive experiments show that Mixture-of-Minds delivers substantial gains, reaching 62.13% on TableBench and surpassing OpenAI-o4-mini-high. These results demonstrate the promise of combining structured multi-agent workflows with RL to advance table understanding.
SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models
Li, Hongxing, Li, Dingming, Wang, Zixuan, Yan, Yuchen, Wu, Hang, Zhang, Wenqi, Shen, Yongliang, Lu, Weiming, Xiao, Jun, Zhuang, Yueting
Spatial reasoning remains a fundamental challenge for Vision-Language Models (VLMs), with current approaches struggling to achieve robust performance despite recent advances. We identify that this limitation stems from a critical gap: existing methods attempt to learn spatial reasoning directly without establishing the hierarchical foundations of perception and understanding. To address this challenge, we present a comprehensive methodology for building spatial intelligence progressively. We introduce SpatialLadder-26k, a multimodal dataset containing 26,610 samples spanning object localization, single image, multi-view, and video spatial reasoning tasks, constructed through a standardized pipeline that ensures systematic coverage across modalities. Building on this dataset, we design a three-stage progressive training framework that (1) establishes spatial perception through object localization, (2) develops spatial understanding through multi-dimensional spatial tasks, and (3) strengthens complex reasoning via reinforcement learning with verifiable rewards. This approach yields SpatialLadder, a 3B-parameter model that achieves state-of-the-art performance on spatial reasoning benchmarks, with 23.4% average improvement over the base model, surpassing GPT-4o by 20.8% and Gemini-2.0-Flash by 10.1%. Notably, SpatialLadder maintains strong generalization with 7.2% improvement on out-of-domain benchmarks, demonstrating that progressive training from perception to reasoning is essential for robust spatial intelligence.
Advancing Speech Summarization in Multi-modal LLMs with Reinforcement Learning
Ling, Shaoshi, Liu, Gang, Ye, Guoli, Li, Jinyu
Speech summarization is a critical component of spoken content understanding, particularly in the era of rapidly growing spoken and audiovisual data. Recent advances in multi-modal large language models (MLLMs), leveraging the power of LLMs, enable generating textual summaries directly from speech without intermediate transcriptions, while supporting controllable styles and zero-shot generalization. However, open-source MLLMs continue to lag behind the state-of-the-art text-based LLMs, limiting their practical deployment for speech summarization. In this work, we present a novel multi-stage reinforcement learning training framework to enhance the speech summarization capabilities in MLLMs. Our model delivers substantial improvements over strong baselines, outperforms much larger MLLMs, and significantly narrows the gap with state-of-the-art text-based LLMs.
Improving Long-term Autoregressive Spatiotemporal Predictions: A Proof of Concept with Fluid Dynamics
Data-driven approaches have emerged as a powerful alternative to traditional numerical methods for forecasting physical systems, offering fast inference and reduced computational costs. However, for complex systems and those without prior knowledge, the accuracy of long-term predictions frequently deteriorates due to error accumulation. Existing solutions often adopt an autoregressive approach that unrolls multiple time steps during each training iteration; although effective for long-term forecasting, this method requires storing entire unrolling sequences in GPU memory, leading to high resource demands. Moreover, optimizing for long-term accuracy in autoregressive frameworks can compromise short-term performance. To address these challenges, we introduce the Stochastic PushForward (SPF) training framework in this paper. SPF preserves the one-step-ahead training paradigm while still enabling multi-step-ahead learning. It dynamically constructs a supplementary dataset from the model's predictions and uses this dataset in combination with the original training data. By drawing inputs from both the ground truth and model-generated predictions through a stochastic acquisition strategy, SPF naturally balances short- and long-term predictive performance, reduces overfitting, and improves generalization. Furthermore, the training process is executed in a one-step-ahead manner, with multi-step-ahead predictions precomputed between epochs, which eliminates the need to retain entire unrolling sequences in memory and keeps memory usage stable. We demonstrate the effectiveness of SPF on the Burgers' equation and the Shallow Water benchmark. Experimental results demonstrate that SPF delivers superior long-term accuracy compared to autoregressive approaches while reducing memory consumption.
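The training loop described in this abstract can be sketched on a toy problem. The sketch below is a loose illustration, not the authors' SPF implementation: the 2-D linear system, the linear model `W`, the full-batch gradient step, and the parameters `p_supp`, `update_interval`, and `horizon` are all assumptions. What it keeps from the abstract is the structure: one-step-ahead training, a supplementary dataset built from the model's own predictions between epochs, and a random draw that mixes those predictions with ground-truth inputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ground truth: a slowly rotating, slightly damped 2-D linear system.
theta = 0.1
A_true = 0.99 * np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta),  np.cos(theta)]])
states = [np.array([1.0, 0.0])]
for _ in range(200):
    states.append(A_true @ states[-1])
states = np.stack(states)                          # shape (201, 2)

def build_supplementary(W, states, horizon=1):
    """Roll the current model forward `horizon` steps (no gradients kept) and
    pair each predicted state with the true state one step after it."""
    inputs = states[:-(horizon + 1)]
    for _ in range(horizon):
        inputs = inputs @ W.T                      # model prediction of the next state
    return inputs, states[horizon + 1:]

def train_spf(states, epochs=400, lr=0.5, p_supp=0.5, update_interval=5):
    dim = states.shape[1]
    W = np.zeros((dim, dim))                       # linear one-step model
    x_true, y_true = states[:-1], states[1:]       # original one-step pairs
    for epoch in range(epochs):
        # Refresh the supplementary dataset between epochs, not inside the step.
        if epoch % update_interval == 0:
            x_supp, y_supp = build_supplementary(W, states)
        # Stochastic acquisition: each supplementary pair is used with prob. p_supp.
        use_supp = rng.random(len(x_supp)) < p_supp
        x = np.concatenate([x_true, x_supp[use_supp]])
        y = np.concatenate([y_true, y_supp[use_supp]])
        # Plain one-step-ahead mean-squared-error gradient step.
        grad = 2.0 * (x @ W.T - y).T @ x / len(x)
        W -= lr * grad
    return W

W_fit = train_spf(states)
```

Each gradient step sees only (input, next state) pairs, so memory use does not grow with the rollout length; the multi-step signal enters solely through the precomputed supplementary inputs.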
Supplementary dataset update interval
Test cases
V: Flow speed for Burgers' equation
h: Total water depth including the undisturbed water depth
u, v: Velocity components in the x (horizontal) and y (vertical) directions
g: Gravitational acceleration
r: Spatial Euclidean distance
ϵ: Balgovind type of correlation function
L: Typical correlation length scale
1. Introduction
Over many years, scientific research has produced highly detailed mathematical models of physical phenomena [1]. These models are frequently and naturally expressed in the form of differential equations [2], most commonly as time-dependent partial differential equations (PDEs).